{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (a) prove Bellman update operator is a $\\gamma$-contraction in the max-norm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose\n",
    "\n",
    "$$\\arg\\max_{s \\in S} \\left|V_1(s) - V_2(s) \\right| = s^* $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "so \n",
    "\n",
    "$$\n",
    "\\left|\\left| V_1 - V_2 \\right| \\right|_{\\infty} = V_1(s^*) - V_2(s^*)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, suppose\n",
    "\n",
    "$$\\arg\\max_{s \\in S} \\left|B(V_1)(s) - B(V_2)(s) \\right| = s^+ $$\n",
    "\n",
    "Note that $B(V)$ is considered a function of $s$, i.e. $B(V) = V'(s) = B(V)(s)$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "so \n",
    "\n",
    "\\begin{align*}\n",
    "\\left|\\left|B(V_1) - B(V_2)\\right|\\right|_{\\infty} \n",
    "&= B(V_1)(s^+) - B(V_2)(s^+) \\\\\n",
    "&= \\gamma \\bigg(\\max_{a \\in A} \\sum_{s' \\in S} p_{s^+a}(s')V(s') - \\max_{a \\in A} \\sum_{s' \\in S} p_{s^+a}(s')V(s') \\bigg) \n",
    "\\end{align*}\n",
    "\n",
    "Note $R(s^+)$ from $B(V_1)$ and $B(V_2)$ are cancelled."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\left|\\left|B(V_1) - B(V_2)\\right|\\right|_{\\infty} \n",
    "&= \\left| B(V_1)(s^+) - B(V_2)(s^+) \\right| \\\\\n",
    "&= B(V_1)(s^+) - B(V_2)(s^+)  \\\\\n",
    "&= \\gamma \\bigg(\\max_{a \\in A} \\sum_{s' \\in S} p_{s^+a}(s')V_1(s') - \\max_{a \\in A} \\sum_{s' \\in S} p_{s^+a}(s')V_2(s') \\bigg) \\\\\n",
    "&= \\gamma \\bigg(\\sum_{s' \\in S} p_{s^+a_1}(s')V_1(s') - \\sum_{s' \\in S} p_{s^+a_2}(s')V_2(s') \\bigg) \\\\\n",
    "&= \\gamma \\sum_{s' \\in S} \\bigg(p_{s^+a_1}(s')V_1(s') - p_{s^+a_2}(s')V_2(s') \\bigg ) \\\\\n",
    "&\\le \\gamma \\sum_{s' \\in S} \\bigg(p_{s^+a_2}(s')V_1(s') - p_{s^+a_2}(s')V_2(s') \\bigg ) \\\\\n",
    "&= \\gamma \\sum_{s' \\in S} p_{s^+a_2}(s') \\bigg(V_1(s') - V_2(s') \\bigg ) \\\\\n",
    "&\\le \\gamma \\bigg(V_1(s^*) - V_2(s^*) \\bigg ) \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* The 2nd equality presumes $B(V_1)(s^+) \\ge B(V_2)(s^+)$, since $V_1$ and $V_2$ have no ordering, the labels can be exchanged without consequence.\n",
    "* The 4th equality supposes $a_1$ and $a_2$ satisfy the $\\max_{a \\in A}$ operator in $B(V_1)$ and $B(V_2)$, respectively."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* The 1st inequality is a result of $\\max_{a \\in A}$, change $a_1$ to $a_2$ will make $\\sum_{s' \\in S} p_{s^+a}(s')V_1(s')$ smaller\n",
    "* The 2nd inequality is a result of $\\sum_{s' \\in S} p_{s^+a_2}(s') \\bigg(V_1(s') - V_2(s') \\bigg )$ being a weighted average of $V_1(s') - V_2(s')$ over $s'$ while $V_1(s^*) - V_2(s^*)$ is the maximum difference following the definition of $s^*$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**About remark**: set $V_1$ to $V^0$, the initial value function, and $V_2$ to $V^*$, the optimal value function, "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\left|\\left|B(V^0) - B(V^*)\\right|\\right|_{\\infty} \n",
    "= \\left|\\left|V^1 - V^*\\right|\\right|_{\\infty} \\le \\gamma \\left|\\left|V^0 - V^*\\right|\\right|_{\\infty} \n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first equality is the result of convergence of the optimal value function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose the value function after $k$ iteration is $V^{k}$, then\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\left|\\left|B(V^{k}) - V^*\\right|\\right|_{\\infty} \n",
    "&\\le \\gamma \\left|\\left|V^{k+1} - V^*\\right|\\right|_{\\infty} \\\\\n",
    "&\\le \\gamma^2 \\left|\\left|V^{k} - V^*\\right|\\right|_{\\infty} \\\\\n",
    "&\\cdots \\\\\n",
    "&\\le \\gamma^{k + 2} \\left|\\left|V^0 - V^*\\right|\\right|_{\\infty} \n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hence,\n",
    "\n",
    "\\begin{align*}\n",
    "\\frac{\\left|\\left|V^{k} - V^*\\right|\\right|_{\\infty}}{\\left|\\left|V^0 - V^*\\right|\\right|_{\\infty}} \\le \\gamma^{k} \\\\\n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# (b) Prove $B$ has at most one fixed point"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'll use proof by contradiction:\n",
    "\n",
    "Assume there are two fixed point $V_1$ and $V_2$, then"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\\begin{align*}\n",
    "\\left|\\left|B(V_1) - B(V_2)\\right|\\right|_{\\infty} \n",
    "&= \\left|\\left|V_1 - V_2 \\right|\\right|_{\\infty} \\\\\n",
    "&\\le \\gamma \\left|\\left|V_1 - V_2 \\right|\\right|_{\\infty} \n",
    "\\end{align*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 1st equality follows the definition of a fixed point. The inequality follows the result of the $(a)$, but it **cannot** be satisfiyed when $\\gamma < 1$, so the result is contradictory with the assumption. Therefore, there is at most only one fixed point."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
